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Abstract 

Constructing generative models for functional observations is an important task in statistical functional analysis. In 
general, functional data contains both phase (or x or horizontal) and amplitude (or y or vertical) variability. Tradi- 
tional methods often ignore the phase variability and focus solely on the amplitude variation, using cross-sectional 
techniques such as fPCA for dimensional reduction and data modeling. Ignoring phase variability leads to a loss of 
structure in the data and inefficiency in data models. This paper presents an approach that relies on separating the phase 
(x-axis) and amplitude (y-axis), then modeling these components using joint distributions. This separation, in turn, is 
performed using a technique called elastic shape analysis of curves that involves a new mathematical representation 
of functional data. Then, using individual fPCAs, one each for phase and amplitude components, while respecting 
the nonlinear geometry of the phase representation space; impose joint probability models on principal coefficients 
of these components. These ideas are demonstrated using random sampling, for models estimated from simulated 
and real datasets, and show their superiority over models that ignore phase-amplitude separation. Furthermore, the 
generative models are applied to classification of functional data and achieve high performance in applications involv- 
ing SONAR signals of underwater objects, handwritten signatures, and periodic body movements recorded by smart 
phones. 

Keywords: amplitude variability, function alignment, function principal component analysis, functional data 
analysis, generative model, phase variability 



1. Introduction 

The problem of statistical analysis and modeling of data in function spaces is important in applications arising 
in nearly every branch of science, including signal processing, geology, biology, and chemistry. The observations 
here are time samples of real-valued functions on an observation interval, and to perform effective data analysis it 
is desirable to have a generative, probabilistic model for these observations. The model is expected to properly and 
parsimoniously characterize the nature and variability in the data. It should also lead to efficient procedures for 
conducting hypothesis tests, performing bootstraps, and making forecasts. An interesting aspect of functional data is 
that underlying variability can be ascribed to two sources. In a sample data the given functions may not be perfectly 
aligned and the mechanism for alignment is an important topic of research. The variability exhibited in functions after 
alignment is termed the amplitude (or y or vertical) variability and the warping functions that are used in the alignment 
are said to capture the phase (or x or horizontal) variability. A more explicit mathematical definition of amplitude and 
phase variability will be made in Section 2. It is imperative that any technique for analysis or modeling of functional 
data should take both these variabilities into account. 

1.1. Need for Phase-Amplitude Separation 

Many of the current methods for analyzing functional data ignore the phase variability. They implicitly assume 
that the observed functions are already temporally aligned and all the variability is restricted only to the y-axis. 
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A prominent example of this situation is functional principal component analysis (fPCA) (see e.g., Ramsay and 
Silverman (2005)) that is used to discover dominant modes of variation in the data and has been extensively used in 
modeling functional observations. If the phase variability is ignored, the resulting model may fail to capture patterns 
present in the data and will lead to inefficient data models. 

Fig. 1 provides an illustration of this using simulated functional data. This data was simulated using the equation 
yi (t) = Zie- ( - t - a > )2/2 , t € [-6,6], i = 1,2,..., 21, where zt is lid. Ml, (0.05) 2 ) and a t is lid. Af(0,(1.25) 2 ). The 
top-left plot shows the original data; each sample function here is a unimodal function with slight variability in height 
and a large variability in the peak placement. One can attribute different locations of the peak to the phase variability. 
If one takes the cross-sectional mean of this data, ignoring the phase variability, the result is shown in the top-middle 
plot. The unimodal structure is lost in this mean function with large amount of stretching. Furthermore, if one 
performs fPCA on this data and imposes the standard independent normal models on fPCA coefficients (details of this 
construction are given later), the resulting model will lack this unimodal structure. Shown in the top-right plot are 
random samples generated from such a probability model on the function space where a Gaussian model is imposed 
on the fPCA coefficients. These random samples are not representative of the original data; the essential shape of the 
function is lost, with some of the curves having two, three, or even more peaks. 

The reason why the underlying unimodal pattern is not retained in the model is that the phase variability was 
ignored. We argue that a proper technique is to separate the phase and amplitude variability, using techniques for 
functional alignment, and then develop a probability model for each component. While postponing details for later, 
we show results obtained by a separation-based approach in the bottom row. The mean of the aligned functions is 
shown in the bottom left panel of Fig. 1 . Clearly retained is the unimodal structure and so do the random samples 
under a framework that model the phase and amplitude variables individually. Some random samples are shown in 
the bottom right panel, these displays are simply meant to motivate the framework and the mathematical details are 
provided later in the paper. This example clearly motivates the need for function alignment for modeling functional 
data that contains phase variability. 



1.5 




(a) Original Data (b) Original Mean (c) Original Random Samples 




(d) Aligned Mean (e) Samples from Proposed Models 

Figure 1 : Samples drawn from a Gaussian model fitted to the principal components for the unaligned and aligned data. 

1.2. Past Literature on Phase-Amplitude Separation 

This brings up an important question: How to separate the phase and amplitude components in a given dataset? 
While this is a topic of ongoing research, a number of techniques have already been discussed in the literature. The 
main difference between them lies in the choice of the cost function used in the alignment process. The different 
cost functions suggested in the statistics literature including area-under-the-curve matching (Liu and Muller (2004); 
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Tang and Muller (2008)), minimum second-eigenvalue (MSE) (Kneip and Ramsay (2008)), moment based matching 
(MBM) (James (2007)), self-modeling registration (Gervini and Gasser (2004)) and £-means alignment and clustering 
(Sangalli et al. (2010a,b)). 

In the meantime several other communities, often outside statistics, have studied registration of functions in one 
or higher dimensions, e.g., in matching MRI images (Beg et al. (2005); Christensen and Johnson (2001); Tagare 
et al. (2009)), shape analysis of curves (Joshi et al. (2007); Klassen et al. (2004); Kurtek et al. (2012); Michor and 
Mumford (2006); Srivastava et al. (2011a); Younes (1998); Younes et al. (2008)), shape analysis of surfaces (Kurtek 
et al. (2011a)), etc. The problem of curve registration is especially relevant for phase-amplitude separation needed 
in functional data analysis since the case for 1R 1 is essentially that for real valued functions! We will adapt a shape- 
analysis approach that has been termed elastic shape analysis (Joshi et al. (2007); Kurtek et al. (2012); Srivastava et al. 
(2011a)). Although these methods have been developed for alignment of curves in W, their application to functional 
data analysis has been explained in Kaziska (2011); Kurtek et al. (2011b); Srivastava et al. (2011b). The basic idea 
in this method is to introduce a mathematical representation, called the square-root slope function or SRSF (details in 
the next section) that improves functional alignment and provides fundamental mathematical equalities that leads to a 
formal development of this topic. 

The theoretical superiority of the elastic method comes from the following fact: the alignment of functions is 
based on a cost term that is a proper distance. Thus satisfying all desired properties in alignment, such as symmetry 
(optimal alignment of / to g is same as that of g to /), positive definiteness (the cost term between any two functions 
is nonnegative and it equals zero if and only if one can be perfectly aligned to the other), and the triangle inequality. 
None of the current methods in the statistics literature (e.g., Gervini and Gasser (2004); James (2007); Kneip and 
Ramsay (2008); Liu and Muller (2004); Tang and Muller (2008)) have these properties. In fact, most of them are 
not even symmetric in their alignment. Additionally, many past methods perform component separation and fPCA 
in two distinct steps, under different metrics, while in elastic shape analysis it is performed jointly under the same 
metric. In addition to these theoretical advantages, we have also emphasized the experimental superiority of elastic 
curve analysis using a large number of datasets in this paper. 

Another important issue, encountered in modeling phase variability, is to characterize the geometry of the phase 
space. Generally speaking, phase variability is represented by a warping function y that satisfies certain properties 
such as boundary conditions, invertibility, smoothness, and smoothness of its inverse. Ramsay and Silverman (2005) 
represent y using a basis set in the log-derivative space, i.e., log(y(f)) = aibiit). Some others force y to be a 
piecewise linear function with positive derivatives (Liu and Muller (2004)) and even linear functions (Sangalli et al. 
(2010a,b)). It becomes clear that boundary conditions, combined with the smoothness and invertibility requirements, 
restrict the set of allowable warping functions to a nonlinear space. Although it seems natural, the use of nonlinear 
geometry of this set in establishing a metric for comparing warping functions and for performing fPCA has seldom 
been studied in the functional data analysis literature. Srivastava et al. (2011a) have studied a square-root derivative 
representation, similar to the one suggested by Bhattacharya (1943), for converting this set into a sphere and analyzing 
warping functions as elements of a Hilbert sphere. The paper Srivastava et al. (2007) demonstrates the advantages of 
using square-root derivative over the log-derivative representation of warping functions. 

1.3. Proposed Framework 

After the separation of phase and amplitude components, we will define two types of distances. One is a y-distance, 
defined to measure amplitude differences between any two functions (and independent of their phase variability) and 
computed as the L 2 distance between the SRSFs of the aligned functions. The other is an jc-distance, or the distance 
on their phase components, that measures the amount of warping needed to align the functions. We will show that 
either of these distances provides useful measures for computing summary statistics, for performing fPCA, and for 
discriminating between function classes. 

The main contribution of this paper is a modeling framework to characterize functional data using phase and 
amplitude separation. The basic steps in this procedure are: 1) Align the original functional data and obtain the aligned 
functions (describing amplitude variability) and the warping functions (describing phase variability). 2) Estimate the 
sample means and covariance on the phase and amplitude, respectively. This step uses a nonlinear transformation on 
the data to enable use of L 2 norm (and cross-sectional computations) for generating summary statistics (see Section 3); 
3) Based on the estimated summary statistics, perform fPCA on the phase and amplitude, respectively; 4) Model 
the original data by using joint Gaussian or non-parametric models on both fPCA representations; 5) As a direct 
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application, the model can be used to perform likelihood-based classification of functional data. We will illustrate this 
application using several data sets which include a simulated data set, a signature data set from Yeung et al. (2004), 
an iPhone action data set from McCall et al. (2012), and a SONAR data set. 

The rest of this paper is organized as follows: Section 2 presents the differential geometric approach for phase 
and amplitude separation adapted from Joshi et al. (2007); Srivastava et al. (2011a) and explained in Kurtek et al. 
(2011b); Srivastava et al. (2011b). Section 3 presents the functional principal component analysis of these phase and 
amplitude components, and statistical modeling of their principal coefficients. These modeling results are presented 
in Section 4. Section 5 describes classification of functional data using the developed models on real data sets, and 
compares results with some conventional methods. Finally, conclusions and observations are offered in Section 6. We 
have developed and R package f dasrvf implementing the proposed functional alignment and fPCA method Tucker 
(2012); this package is available on the CRAN archive. 

2. Phase and Amplitude Separation Using Elastic Analysis 

In this section, we adapt a method introduced for elastic shape analysis of curves in Joshi et al. (2007); Srivastava 
et al. (201 la) to the problem of functional data alignment. The details are presented in companion papers Kurtek et al. 
(2011b); Srivastava et al. (2011b). This comprehensive framework addresses three important goals: (1) completely 
automated alignment of functions using nonlinear time warping, (2) separation of phase and amplitude components of 
functional data, and (3) derivation of individual phase and amplitude metrics for comparing and classifying functions. 
For a more comprehensive introduction to this theory, including asymptotic results and estimator convergences, we 
refer the reader to these two papers as we will only present the algorithm here. 

2.1. Mathematical Representation of Functions 

Let / be a real -valued function with the domain [0,1]; the domain can easily be transformed to any other interval. 
For concreteness, only functions that are absolutely continuous on [0, 1] will be considered; let T denote the set of 
all such functions. In practice, since the observed data are discrete, this assumption is not a restriction. Also, let 
r be the set of boundary-preserving diffeomorphisms of the unit interval [0,1]: F = {y : [0, 1] — > [0, 1]| y(0) = 
0, y(l) = 1, y is a diffeomorphism). Elements of F play the role of warping functions. For any / eT and y e F, the 
composition / o y denotes the time-warping of / by y. With the composition operation, the set T is a group with the 
identity element y,d(f) = t. This is an important observation since the group structure of Y is seldom utilized in past 
papers on functional data analysis. 

In a pairwise alignment problem, the goal is to align any two functions f\ and fx. A majority of past methods use 
cost terms of the type (inf yE r ll/i — /2°7ll) to perform this alignment. Here || • || denotes the standard L 2 norm. However, 
this alignment is neither symmetric nor positive definite. To address this and other related problems, Srivastava et al. 
(2011a) introduced a mathematical expression for representing a function. This function, q : [0, 1] — > E, is called the 
square-root slope function or SRSF of /, and is defined in the following form: 

?(0 = sign(/(0)^7wi. 

It can be shown that if the function / is absolutely continuous, then the resulting SRSF is square-integrable (see 
Robinson (2012) for a proof). Thus, we will define L 2 ([0, 1], R), or simply L 2 , to be the set of all SRSFs. For every 
q e L 2 and a fixed t e [0, 1], the function / can be obtained precisely using the equation: /(f) = /(0) + L q(s)\q(s)\ds, 
since q{s)\q(s)\ = f(s). Therefore, the mapping from T to L 2 x R, given by / h-> (q, f(Q)) is a bijection (see Robinson 
(2012)). The most important property of this framework is the following. If we warp a function / by y, the SRSF of 
/ o y is given by: q(t) = (q,y)(t) = q(y(t)) ^y{t). With this expression it can be shown that for any f\,f% e T and 
y e T, we have 

ll?i-94ll = ll(?i,y)-(fc,y)ll, (2.1) 

where q\,q2 are SRSFs of f , /2, respectively. This is called the isometry property and it is central in suggesting a new 
cost term for pairwise registration of functions: inf yE r ll<7i - (qi, 7)11- This equation suggests we can register (or align) 
the SRSFs of any two functions first and then map them back to T to obtain registered functions. The advantage of 
this choice is that it is symmetric, positive definite, and satisfies the triangle inequality. Technically, it forms a proper 
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distance 1 on the quotient space Lr/F. We mention that this cost function has a built-in regularization term and does 
not require any additional penalty term. Please refer to papers Kurtek et al. (201 lb); Srivastava et al. (201 lb) for more 
details. In case one wants to control the amount of warping or elasticity this can be done as described in Wu and 
Srivastava (2011). 

The isometric property in Eqn. 2. 1 leads to a distance between functions that is invariant to their random warpings: 

Definition 1 (Amplitude or y-distance). For any two functions f\, fi^T and the corresponding SRSFs, q\,q2 e L 2 , 
we define the amplitude or the y-distance D y to be: 

D y (f u f2)=mf\\ qi -(q 2 c 7 )Jf)\\. 

It can be shown that for any j\ , yi e F, we have D y {f\ ° yi,fi °yi) = A > fz)- 
Optimization Over F: The minimization over F can be performed in many ways. In case F is represented by a 
parametric family, then one can use the parameter space to perform the estimation as Kneip and Ramsay (2008). 
However, since F is a nonlinear manifold, it is impossible to express it completely in a parametric vector space. In this 
paper we use the standard Dynamic Programming algorithm (Bertsekas (1995)) to solve for an optimal y. It should 
be noted that for any fixed partition of the interval [0, 1], this algorithm provides the exact optimal y that is restricted 
to the graph of this partition. 

2.2. Karcher Mean and Function Alignment 

In order to separate phase and amplitude variability in functional data, we need a notion of the mean of functions. 
Basically, first we compute a mean function and in the process warp the given functions to match the mean function. 
Since we have a proper distance in D y , in the sense that it is invariant to random warping, we can use that to define 
this mean. For a given collection of functions f\ ,f2,..., f n , \&tq\,q 2 ,...,q n denote their SRSFs, respectively. Define 
the Karcher mean of the given function as a local minimum of the following cost functions: 

// 

H f = argminV D y <Jjtf (2.2) 
ft* ti 

H q = argminY inf||<7 -(<?,■, y,)ll 2 • ( 2 - 3 ) 

(This Karcher mean has also been called by other names such as the Frechet mean, intrinsic mean or the centroid.) 
These are two equivalent formulations, one in the function space T and other in the SRSF space L 2 , i.e., fi q = 
sign(/i /) yj\fif\. Note that if fif is a minimizer of the cost function, then so is fx/ o y for any y e F since D y is invariant 
to random warpings of its input variables. So, we have an extra degree of freedom in choosing an arbitrary element of 
the set {fif o y\y e F}. To make this choice unique, we can define a special element of this class as follows. Let {y*} 
denote the set of optimal warping functions, one for each i, in Eqn. 2.3. Then, we can choose the fif to that element 
of its class such that the mean of {y*}, denoted by y„, is y i( j, the identity element. (The notion of the mean of warping 
functions and its computation are described later in Section 3.1 and summarized in Algorithm 2). The algorithm for 
computing the Karcher mean fif of SRSFs is given in Algorithm 1, where the iterative update in Steps 2-4 is based on 
the gradient of the cost function given in Eqn. 2.3. 
This procedure results in three items: 

1. fi q , preferred element of the Karcher mean class \{p q ,y)\y e F), 

2. {§,}, the set of aligned SRSFs, and 

3. {y*}, the set of optimal warping functions. 



'We note that restriction of L 2 metric to SRSFs of functions whose first derivative is strictly positive, e.g., cumulative distribution functions, is 
exactly the classical Fisher-Rao Riemannian metric used extensively in the statistics community Amari (1985); Cencov (1982); Efron (1975); Kass 
and Vos (1997); Rao (1945). 
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Algorithm 1: Phase- Amplitude Separation 

1. Compute SRSFs q\,qi, ... ,q„ of the given f\,fz,.. .,f„ and select// = qi, where i = argmin l£(£n \\q~\ 2" =1 qj\\- 

2. For each qi find the y* such that y* = argmin yEr - (q, o y) Vyll). The solution to this optimization comes 
from the dynamic programming algorithm. 

3. Compute the aligned SRSFs using i-> (q, o y*) yjy 7 *. 

4. If the increment ||^ £"=i Qi ~ lA\ is small, then continue. Else, update the mean using fx h-> - Yh=i Qi an< ^ remrn 
to step 2. 

5. The function fi represents a whole equivalence class of solutions and now we select the preferred element n q of 
that orbit: 

(a) Compute the mean y^ of all {y*} (using Algorithm 2). Then compute fi q = (pi o y^ 1 ) Jy^. 

(b) Update y* Hy'oy^ 1 . Then compute the aligned SRSFs using qi i-> {q- t o y*) Jy*. 



From the aligned SRSFs, one can compute individual aligned functions using: fi(t) = /j(0) + C qj(s)\qi(s)\ ds. 

To illustrate this method we run the algorithm on the data previously used in Kneip and Ramsay (2008). The 
individual functions are given by: y,(f) = z^ie^' -1 ' 5 ' + Zi,ie^ ,+l ^ ^ 2 , t e [-3,3], i = 1,2,... ,21, where z,,i and z,,2 
are i.i.d. 7V(1, (0.25) 2 ). (Note that although the elastic framework was developed for functions on [0, 1], it can easily 
be adapted to an arbitrary interval). Each of these functions is then warped according to: y,(f) = 6(^^— -J - 3 if 
a, + 0, otherwise y, = y^ (yid(t) — t is the identity warping). Here a, are equally spaced between -1 and 1, and the 
observed functions are computed using jt ; (f) = yi°Ji{i). A set of 21 such functions forms the original data and is shown 
in Panel (d) of Fig. 2 with corresponding SRSFs in Panel (a). Panel (b) presents the resulting aligned SRSFs using our 
method {§,} and Panel (c) plots the corresponding warping functions {y*}. The corresponding aligned functions {//} is 
shown in Panel (e). It is apparent that the plot of {/] } shows a tighter alignment of functions with sharper peaks and 
valleys, and thinner band around the mean. This indicates that the effects of warping generated by the y,s have been 
completely removed and only the randomness from the y,-s remain. 




-2 2 -2 2 



(d) Original Data {fi} (e) Warped Data ifi} 

Figure 2: Alignment of the simulated data set using Algorithm 1. 
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We also compare the performance of Algorithm 1 with some published methods including; the MBM method 
of James (2007) and the MSE method of Ramsay and Silverman (2005) on a more difficult simulated data and a 
real SONAR data set. The original simulated data are shown in Fig. 3(a) and the data consists of 39 unimodal 
functions which have been warped with equally-spaced centers along the x-axis and have slight variation in peak- 
heights along the y-axis. Fig. 3(b)-(d) present the alignment results for our elastic method, the MBM method, and the 
MSE method, respectively. The original SONAR data are shown in Fig. 3(e) and the data consists of 131 measured 
SONAR signals that contain measurement ambiguity. Fig. 3(f)-(h) present the alignment results for our elastic method, 
the MBM method, and the MSE method, respectively. For the simulated data the elastic method performs the best 
while the MBM method performs fairly well with a little higher standard deviation. The MBM method and the MSE 
method both have a few numerical issues that lead to blips in the functions. For the SONAR data only the elastic 
method performs well, as MBM and MSE methods fail to align the data at all. We can also quantify the alignment 
performance using the decrease in the cumulative cross-sectional variance of the aligned functions. For any functional 
dataset {#,(?), i = 1, 2, . . . , n, t e [0, 1]}, let 



Var(te}) = ^- f f U)--tgi(f) 
n-l Jo £A n tt ) 



2 

dt , 



denote the cumulative cross-sectional variance in the given data. With this notation, we define: 

Original Variance = Var( \fi\), Amplitude Variance = Var({/)}), Phase Variance = Var({//y o y ; }) . 

The phase- and amplitude-variances for the different alignment algorithms shown in Fig. 3 is listed below in Table 1 
with the simulated unimodal data on the top two rows and the SONAR data on the bottom two rows: Based on its 
superior performance and theoretical advantages, we choose the elastic method for separating the phase and amplitude 
components. For additional experiments and asymptotic analysis of this method, please refer to Kurtek et al. (201 lb); 
Srivastava et al. (201 lb). 



7 



Data Component Orij 


rinal Variance Elastic Method 


MBM MSE 


Unimodal Amplitude-variance 


4.33 


0.004 


0.23 .02 


Phase-variance 





4.65 


4.31 4.54 


SONAR Data Amplitude-variance 


2.89e-5 


1.53e-5 


3.02e-5 2.42e-5 


Phase-variance 





1.48e-5 


1.30e-5 1.36e-5 



Table 1 : The comparison of the amplitude variance and phase variance for different alignment algorithms on the Unimodal and SONAR data set. 




Figure 4: Depiction of the SRSF space of warping functions as a sphere and a tangent space at identity if/jj. 



3. Analysis and Modeling of Components 

Having separated functional data into phase and amplitude components, we focus on the task of developing their 
generative models. 

3.1. Phase-Variability: Analysis of Warping Functions 

First, we would like to study the phase-variability of the given functions, available to us in the form of the warping 
functions {y*} resulting from Algorithm 1. An explicit statistical modeling of the warping functions can be of interest 
to an analyst since they represent the phase-variability of the original data. As mentioned earlier, the space of warping 
functions, T, is a nonlinear manifold and cannot be treated as a Hilbert space directly. Therefore, we will use tools 
from differential geometry to be able to perform statistical analysis and modeling of the warping functions. This 
framework has been used previously but in different application areas, e.g., modeling parameterizations of curves 
Srivastava and Jermyn (2009) and studies of execution rates of human activities in videos Veeraraghavan et al. (2009). 
It also relates to the square-root representation of probability densities introduced by Bhattacharya (1943). 

Let j\, 72, • • • , Jn e F be a set of observed warping functions. Our goal is to develop a probability model on Y that 
can be estimated from the data directly. There are two problems in doing this in a standard way: (1) F is a nonlinear 
manifold, and (2) it is infinite dimensional. The issue of nonlinearity is handled using a convenient transformation 
which coincidentally is similar to the definition of SRSF, and the issue of infinite dimensionality is handled using 
dimension reduction, e.g., fPCA, which we will call horizontal fPCA. We are going to represent an element y e T by 
the square-root of its derivative tfr — Vy. Note that this is the same as the SRSF defined earlier for /is and takes this 
form since y > 0. The identity map y^ maps to a constant function with value ij/idif) = 1- Since y(0) = 0, the mapping 
from y to ip is a bijection and one can reconstruct y from \j/ using yit) = f if/(s) 2 ds. An important advantage of this 
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transformation is that since ||i^|| 2 = L ifr(t) 2 dt = C j(t)dt = y(l) - y(0) = 1, the set of all such ifrs is a Hilbert sphere 
§00, a unit sphere in the Hilbert space L 2 . In other words, the square-root representation simplifies the complicated 
geometry of F to a unit sphere. The distance between any two warping functions is exactly the arc-length between 
their corresponding SRSFs on the unit sphere Soo: 

D x {y\,72) = dptyufo) = cos' 1 m i/fi(t)i/f 2 (t)dtj . 

Fig. 4 shows an illustration of the SRSF space of warping functions as a unit sphere. 

The definition of a distance on Soo helps define a Karcher mean of sample points on §«,. 

Definition 2. For a given set of points \p\ , if/2, ■ ■ ■ , ij/ n £ Soo, their Karcher mean in Soo is defined to be a local minimum 
of the cost function 1// 1— > £Li dy,(iff,i//i) 2 . 

Now we can define the Karcher mean of a set of warping functions using the Karcher mean in Soo- For a given set 
of warping functions 71,72, •■ • ,7« £ T, their Karcher mean in T is 7(f) = J ^{s) 2 ds where ^ is the Karcher mean 
of VtT, Vt2' • • • > Vt« m §M ■ The search for this minimum is performed using Algorithm 2: 

Algorithm 2: Karcher Mean of Warping Functions 

Let if/, = Vt7 be the SRSFs for the given warping functions. Initialize p.^ to be one of the if/jS or simply w/||w||, where 

*" = ; Hi 

1. For i — 1,2, ... ,n, compute the shooting vector v* = , (fa - cos(#,)yU^), 6, - cos -1 (J^p^,, By definition, 
each of these v; e 7), (§ M )- 

2. Compute the average v = i 2"=i v, s r w (§ M ). 

3. If ||v|| is small, then continue. Else, update ^ i-> cos(e||v||)/i^ + sin(e||v||)pjj, for a small step size e > and 
return to Step 1 . 

4. Compute the mean warping function using y(t) = JT /u^(s) 2 ds. Stop. 



Since Soo is a nonlinear space (a sphere), one cannot perform principal component analysis on it directly. Instead, 
we choose a vector space tangent to the space, at a certain fixed point, for analysis. The tangent space at any point 
if/ € Soo is given by: 7V(Soo) = {v e L 2 | j Q v(t)i//(t)dt = 0). In the following, we will use the tangent space at p.^ 
to perform analysis. Note that the outcomes of Algorithm 2 include the Karcher mean ^ and the tangent vectors 
{v,} € 7^ (Soo). These tangent vectors, also called the shooting vectors, are the mappings of into the tangent 
space 7^ (Soo), as depicted in Fig. 4. In this tangent space we can define a sample covariance function: (t\,t2) h-> 
~[ 27=1 v i(fi)v;te)- I n practice, this covariance is computed using a finite number of points, say T, on these functions 
and one obtains a T x T sample covariance matrix instead, denoted by K^. The singular value decomposition (SVD) 
of = U^^Vj provides the estimated principal components of {t/f,}: the principal directions U^j and the observed 

principal coefficients /v,-, UyjY This analysis on Soo is similar to the ideas presented in Srivastava et al. (2005) although 
one can also use the idea of principal nested sphere for this analysis Jung et al. (2012). 

As an example, we compute the Karcher mean of a set of random warping functions. These warping functions 
are shown in the left panel of Fig. 5 and their Karcher mean is shown in the second panel. Using the {v,}'s that result 
from Algorithm 2, we form their covariance matrix and take its SVD. The first three columns of U$ are used to 
visualize the principal geodesic paths in the third, fourth, and fifth panels. 

3.2. Amplitude Variability: Analysis of Aligned Functions 

Once the given observed SRSFs have been aligned using Algorithm 1, they can be statistically analyzed in a 
standard way (in L 2 ) using cross-sectional computations in the SRSF space. This is based on the fact that D v is the 
L 2 distance between the aligned SRSFs. For example, one can compute their principal components for the purpose 
of dimension reduction and statistical modeling using fPCA. Since we are focused on the amplitude-variability in this 
section, we will call this analysis vertical fPCA. 
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Let /i , • ■ • , f„ be a given set of functions, and q l , ■ ■ ■ ,q„be the corresponding SRSFs, p q be their Karcher Mean, 
and let g,s be the corresponding aligned SRSFs using Algorithm 1. In performing vertical fPCA, one should not 
forget about the variability associated with the initial values, i.e., {/i(0)}, of the given functions. Since representing 
functions by their SRSFs loses this initial value, this variable is treated separately. That is, a functional variable / is 
analyzed using the pair (q, /(0)) rather than just q. This way, the mapping from the function space T to L 2 x K is 
a bijection. In practice, where q is represented using a finite partition of [0,1], say with cardinality T, the combined 
vector hi - [qi fi(0)] simply has dimension (T + 1) for fPCA. We can define a sample covariance operator for the 
aligned combined vector h = [qi fi(0)] as 

1 " 

K h (s,t)= r y£[to-Aft(*))to-Aft(0)] e M (r+1)x(r+I) , (3.1) 

n - 1 

where yu/, = \±i q /(0)]. Taking the SVD, K/, = U^hVl we can calculate the directions of principle variability in the 
given SRSFs using the first p < n columns of Uh and can be converted back to the function space T, via integration, 
for finding the principal components of the original functional data. Moreover, we can calculate the observed principal 
coefficients as (hi, Uh,jj- 

One can then use this framework to visualize the vertical principal-geodesic paths. The basic idea is to compute a 
few points along geodesic path r i-> pi t +t JT^JjUhj for r e E in L 2 , where S^jj and Uhj are the f h singular value and 
column, respectively. Then, obtain principle paths in the function space T by integration as described earlier. Figure 
6 shows the results of vertical fPCA on the simulated data set from Fig. 2. It plots the vertical principal-geodesic paths 
of the SRSFs, q T j for r = -2,-1,0, 1,2 and j = 1,2,3 and the vertical principal-geodesic paths in function space. 
The first 3 singular values for the data are: 0.0481, 0.0307, and 0.0055 with the rest being negligibly small. The 
first principal direction corresponds to the height variation of the second peak while the second principal component 
captures the height variation of the first peak. The third principal direction has negligible variability. 

3.3. Modeling of Phase and Amplitude Components 

To develop statistical models for capturing the phase and amplitude variability, there are several possibilities. 
Once we have obtained the fPCA coefficients for these components we can impose probability on the coefficients 
and induce a distribution on the function space T ■ Here we explore two possibilities: a joint Gaussian model and a 
non-parametric model. 

Let c = (ci, . . . ,q,) and z = (zi, . . . ,Zjt 2 ) be the dominant principal coefficients of the amplitude- and phase- 
components, respectively, as described in the previous two sections. Recall that Cj = (h, Uh,jj and Zj = (v, U^^. We 
can reconstruct the amplitude component using q — p. q + c jUh,j an d f s (t) - f s (0) + J q(s)\q(s)\ds. Here, / s (0) is 
a random initial value. Similarly, we can reconstruct the phase component (a warping function) using v = £ ij ZjU^j 

and then using if/ = cos(||v||)/fy + sin(||v||)jj^jj, and y s (t) = tf/(s) 2 ds. Combining the two random quantities, we obtain 
a random function f o y s . 
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3.3.1. Gaussian Models on fPCA Coefficients 

In this setup the model specification reduces to the choice of models for / s (0), c, and z. We are going to model 
them as multivariate normal random variables. The mean of / 5 (0) is /(0) while the means of c and z are zero vectors. 



Their joint covariance matrix is of the type: 



covariance between /(0) and c, L 2 e 



s 



Here, L\ e 



)lX*i 



captures the 



L 2 

S £ ]j(*l+fe+l)X(*l+fe+l) 

2 " ^ - 

lxfe between /(0) and z, and S e R* lX * 2 between c and z. As discussed in the 
previous sections £/, e R* 1 ** 1 and € M. k2Xk2 are diagonal matrices and are estimated directly from the data. We will 
call this resulting probability model on the fPCA coefficients as pc aU ss- 



3.3.2. Non-parametric Models on fPCA Coefficients 

An alternative to the Gaussian assumption made above is the use of kernel density estimation Silverman (1998), 
where the density of / s (0), each of the k\ components of c, and the k 2 components of z can be estimated using 



where 7C( ) is the smoothing kernel, which is a symmetric function that integrates to 1, and b > is the smoothing 
parameter or bandwidth. A range of kernel functions can be used, but a common choice is the Gaussian kernel. 



4. Modeling Results 

We will now evaluate the models introduced in the previous section using random sampling. We will first estimate 
the means and the covariances from the given data, estimate the model parameters, and then generate random samples 
based on these estimated models. We demonstrate results on two simulated data sets used in Figs. 2 and 3 and one 
real data set being the Berkeley growth data 2 . For the first simulated data set, shown in Fig. 2, we randomly generate 
35 functions from the amplitude model and 35 domain-warping functions from the phase model and then combine 
them to generate random functions. The corresponding results are shown in Fig. 7, where the first panel is a set of 
random warping functions, the second panel is a set of corresponding amplitude functions, and the third panel shows 



2 http://www.psych.mcgill.ca/faculty/ramsay/datasets.html 
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their compositions. Comparing them with the original datasets (Fig. 2) we conclude that the random samples are very 
similar to the original data and, at least under a visual inspection, the proposed models are successful in capturing the 
variability in the given data. Furthermore, if we compare these sampling results to the fPCA-based Gaussian model 




0.5 1 -3 3 -3 3 -3 3 



Figure 7: Random samples from jointly Gaussian models on fPCA coefficients of y" (left) and f s (middle), and their combinations f s o y s (right) 
for Simulated Data 1. The last plot are random samples if a Gaussian model is imposed on / directly without any phase and amplitude separation. 

directly on / (without separating the phase and amplitude components) in the last panel of Fig. 7, we notice that our 
model is more consistent with the original data. A good portion of the samples from the non-separated model just 
contain three peaks or have a higher variation than the original data and some barely represent the original data. 

For the second simulated data set we use the data shown in Fig. 3 and perform vertical and horizontal fPCA. 
As before, we randomly generate 35 functions from the amplitude model and 35 domain-warping functions from 
the phase model and then combine them to generate random functions. The corresponding results are shown row of 
Fig. 8, where the first panel is a set of random warping functions, the second panel is a set of corresponding amplitude 
functions, and the last panel shows their compositions. Comparing them with the original data in Fig. 3 we conclude 
that the random samples are very similar to the original data and, under visual inspection, the proposed models are 
successful in capturing the variability in the given data. In this example performing fPCA directly on the function 
space does not correctly capture the data and fails to generate any single unimodal function shown in the last panel. 




0.5 1 0.5 1 0.5 1 0.5 1 



Figure 8: Random samples from jointly Gaussian models on fPCA coefficients of y s (left) and f s (middle), and their combinations f s o y s (right) 
for Simulated Data 2. The last panel shows the random samples resulting from a Gaussian model imposed on / directly. 

For the Berkley growth data we again develop our phase and amplitude models then randomly generate 35 func- 
tions from the amplitude model and 35 domain-warping functions from the phase model. Then combine them to 
generate random functions. The corresponding results are shown row of Fig. 9, where the first panel is a set of random 
warping functions, the second panel is a set of corresponding amplitude functions, and the last panel shows their 
compositions. Comparing them with the original data set in the last panel we conclude that the random samples are 
similar to the original data and, at least under a visual inspection, the proposed models are successful in capturing the 
variability in the given data. 

5. Classification Using Phase and Amplitude Models 

An important use of statistical models of functional data is in classification of future data into pre-determined 
categories. Since we have developed models for both amplitude and phase, one or both can be used for classification 
and analyzed for their performance. Here we use a classical setup: a part of the data is used for training and estimation 
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Figure 9: From left to right: Random samples from jointly Gaussian models on fPCA coefficients of y 5 and f", respectively, and their combinations 
f s o y s for the Berkley Growth Data. The last panel shows the original data used in this experiment. 

of model parameters while the remaining part is used for testing. This partition is often random and repeated many 
times to obtain an average classification performance. 

Amplitude-Based Classification: As described earlier, we can impose a probability model for the amplitude data 
using the principal subspace associated with the aligned SRSFs. The actual model is imposed on the principal coef- 
ficients {c\,C2, . . . ,q,), with respect to the basis Uh,\, Uf,,2, ■ ■ ■ , Uh,k\- These basis elements, in turn, are determined 
using the training data. We can select a k\ such that the cumulative energy Y^j = \ ^kjjl Zj=/ ^h,jj is above a certain 
threshold, e.g., 90 percent. There are two choices of models: Gaussian and kernel-density estimator. Classification 
is performed by constructing the appropriate models for each class C\, ■ • ■ , Cl of the data. Then, for a test sample 
hj e R r+I project it to the principal subspace using an orthonormal basis Um e IR (:r+1)x ' :i , one for each class, and 
calculate the likelihood under each class. The model with the largest likelihood represents the class assigned to hj. 
Therefore, our classification rule is: 

classify^) = argmax p a {lf u hj\K h i, p M ) , where p a = PGmss or p ker . (5.1) 
c, 

Phase-Based Classification: Similarly, for the phase components, we can represent the shooting vectors, {v,}, in a 
lower order dimensional space using the first kj columns of U^. Where hi can be chosen similar to k\ as described 
above. Once again, we can either define a Gaussian model or a kernel density estimator on these principal coefficients. 
We can estimate the model parameters for each class C\, ■ ■ ■ ,Cl using the training data. Then, for a test sample's 
shooting vector \>j, we project it to each model's subspace and calculate the likelihood of vj under each pre-determined 
class. Therefore, our classification rule is: 

classify^) = argmax p^UlvjlK^f) where p^ = p Causs or p ker . (5.2) 
c. 

Joint Classification: Assuming independence we can combine the amplitude and phase classification rules as, 

classify^, v,) = argmax paiUjfijlKhhPhdPyiUliVjlK^) (5.3) 

c, 

and classification is as described previously. 

In this section, we present the classification results on a signature data Yeung et al. (2004), an iPhone-generated 
action data set from McCall et al. (2012), and a SONAR data set using models developed using vertical and horizontal 
fPCA. 

5.7. Signature Data 

In this section, we test our classification method on a signature recognition data set from Yeung et al. (2004). The 
data was captured using a WACOM Intuos tablet. The data set consists of signature samples from 40 different subjects 
with 20 real signature samples of the subject and another 20 samples which are forgeries of the subject's signature. 
In our analysis we are going to distinguish between the real and forgery signature for two of the subjects using the 
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tangential acceleration. The tangential acceleration is computed as A(t) - yj[X"(t)] 2 + [Y"(t)] 2 . To have a robust esti- 
mate of the SRSF {<?,}, we first smooth the original functions 100 times {f,} using a standard box filter [1/4, 1/2, 1 /4]. 
That is, numerically we update the signals at each discrete point by fi(Xk) — > ( j/X-ty-i) + \fi{xk) + \fi(Xk+i)X The 
smoothed acceleration functions are aligned in each class (real vs. fake) using our alignment algorithm from Section 2. 
An example signature with 10 realizations is shown in Fig. 10 along with the corresponding acceleration functions for 
both the real and fake signatures, the corresponding aligned functions, and warping functions. 
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Figure 10: From left to right: the original signature samples for one of the subjects, the corresponding tangential acceleration functions for both 
the real and fake signatures, the corresponding aligned functions, and warping functions. 



Models were generated for the three classes as was outlined in Section 3.3 by performing vertical and horizontal 
fPCA on the aligned data and the warping functions, respectively. We then impose a multivariate Gaussian model, 
PGauss, on the reduced data for each class, it is assumed here that the cross-covariances L\ and are zero. The 
threshold to select the number of dimensions, k\ and k%, was set at 95%. Classification for the amplitude component 
only was performed as described in Section 5 using the classification rule in (Eqn. 5.1) and was evaluated using 5-fold 
cross-validation. Similarly, the classification rule in (Eqn. 5.2) were used for the phase component. Moreover, the 
joint classification was performed using (Eqn. 5.3). Table 2a presents the the mean and standard deviation (shown 
in parentheses) of the classification rates from the cross-validation for the three rules. As well as comparing to the 
standard L 2 where models were generated directly on the original data, dimension reduction with fPCA, and imposing 
a multivariate normal distribution. Corresponding results for another subject, U13 is presented in Table 2b. 



Gaussian Kernel Density Gaussian Kernel Density 



amplitude only 0.93 (0.07) 


0.78 (0.19) 


amplitude only 0.75 (0.14) 


0.78 (0.21) 


phase only 0.65(0.16) 


0.75 (0.09) 


phase only 0.50(0.01) 


0.50 (0.01) 


phase and amplitude 0.90 (0.05) 


0.80 (0.07) 


phase and amplitude 0.58 (0.11) 


0.60 (0.10) 


standard L, 2 0.60(0.14) 


0.55 (0.11) 


standard L 2 0.50(0.01) 


0.53 (0.06) 


(a) Subject Ul 




(b) Subject Ul 3 





Table 2: Mean classification rate and standard deviation (in parentheses) for 5-fold cross-validation on the signature data. 

The classification rates have a low standard deviation indicating good generalization, though we do have a little 
variation for the phase only model. For both subjects the amplitude only rule greatly outperforms both the phase only 
rule and the standard L 2 with the best performance of 93% and 75% for subjects Ul and U13, respectively. Since 
the phase only rule performs poorly combining it with the amplitude only rule brings down the overall performance. 
The alignment and modeling using a proper distance improves the overall classification performance of the data. To 
compare the results between pcauss and pk em , we classified the data again forming models using Pkem which was 
discussed in Section 5, where each of the k\ and £2 components has an estimated density using a kernel density 
estimator and independence is assumed. We used the Gaussian kernel function and the bandwidth was selected 
automatically based upon the data using the method presented by Botev et al. (2010). Classification using the three 
classification rules was performed using 5-fold cross-validation. Table 2a and Table 2b present the the mean and 
standard deviation of the classification rates from the cross-validation for the three rules as well as comparing to the 
standard L 2 . Models were generated directly on the original data using fPCA and the kernel density estimator for 
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Figure 1 1 : Original iPhone functions for the walking, jumping, and climbing activities in the first column (in corresponding descending order) with 
the corresponding aligned functions and warping functions in the second and third columns, respectively. 

subjects Ul and U13, respectively. We see an improvement in the phase only method for subject Ul and reduction in 
performance for the other methods, this suggest the warping functions have some non-Gaussian behavior. However, 
for subject U13 there is a minimal change between the Gaussian and kernel estimator. 

5.2. iPhone Action Data 

This data set consists of aerobic actions recorded from subjects using the Inertial Measurement Unit (IMU) on 
an Apple iPhone 4 smartphone. The IMU includes a 3D accelerometer, gyroscope, and magnetometer. Each sample 
was taken at 60Hz, and manually trimmed to 500 samples (8.33s) to eliminate starting and stopping movements and 
the iPhone is always clipped to the belt on the right hand side of the subject. There is a total of 338 functions for 
each measurement on the IMU and the actions recorded consisted of biking, climbing, gym bike, jumping, running, 
standing, treadmill, and walking. With the number of samples being 30, 45, 39, 45, 45, 45, 44, and 45, respectively for 
each action. For more information on the data set the reader is referred to McCall et al. (2012). For our experiments 
we used the accelerometer data in the x-direction. Again, to have a robust estimate of the SRSF {g,}, we first smooth 
the original signals 100 times {/j} using the standard box filter described in the previous section. As with the previous 
data set, the smoothed iPhone data are aligned in each class (activity) using our method. A selected subset of functions 
from three activities is shown in Fig. 11 along with corresponding aligned functions and warping functions. 

To perform the classification, models were generated for the 8 classes by performing vertical and horizontal fPCA 
on the aligned data and the warping functions then imposing a multivariate Gaussian on the reduced data for each 
class. The threshold to select the number of dimensions, k\ and kj, was set at 95%. Classification was performed 
as in the previous section. Table 3 presents the mean and standard deviation of the classification rates for the cross- 
validation for all three rules as well as comparing to the standard L 2 . The classification rates have a low standard 
deviation indicating good generalization. The phase only rule and the amplitude only rule, drastically out perform the 
standard L 2 with the combination providing the best performance at 62%. The alignment and modeling using a proper 
distance improves the overall classification performance of the data. We again used the kernel density estimator to 
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Gaussian Kernel Density 



amplitude only 


0.60 


(0.04) 


0.62 


(0.05) 


phase only 


0.34 


(0.06) 


0.35 


(0.06) 


phase and amplitude 


0.62 


(0.08) 


0.62 


(0.07) 


standard L 2 


0.12 


(0.02) 


0.12 


(0.02) 



Table 3: Mean classification rate and standard deviation (in parentheses) for 5-fold cross-validation on the iPhone data. 

compare the results with the Gaussian assumption and the results are presented in Table 3. Using the kernel density 
estimator we see only minor improvements in the phase only rule, suggesting the Gaussian assumption is sufficient 
for this data. 

5.3. SONAR Data 

The data set used in these experiments was collected at the Naval Surface Warfare Center Panama City Division 
(NSWC PCD) test pond. For a description of the pond and measurement setup the reader is referred to Kargl et al. 
(2010). The raw SONAR data was collected using a 1 - 30kHz LFM chirp and data was collected for nine proud 
targets that included a solid aluminum cylinder, an aluminum pipe, an inert 81mm mortar (filled with cement), a solid 
steel artillery shell, two machined aluminum UXOs, a machined steel UXO, a de-militarized \52mm TP-T round, a 
de-militarized 155mm empty projectile (without fuse or lifting eye), and a small aluminum cylinder with a notch. The 
aluminum cylinder is 2ft long with a Ift diameter; while the pipe is 2ft long with an inner diameter of Ift and 3/8 
inch wall thickness. 

The acoustic signals were generated from the raw SONAR data to construct target strength as a function of fre- 
quency and aspect angle. Due to the relatively small separation distances between the targets in the measurement 
setup, the scattered fields from the targets overlap. To generate the acoustic templates (i.e., target strength plot of 
frequency versus aspect), synthetic aperture sonar (SAS) images were formed and then an inverse imaging technique 
was used to isolate the response of an individual target and to suppress reverberation noise. A brief summary of 
this process is as follows: The raw SONAR data are matched filtered and the SAS image is formed using the co - k 
beamformer Soumekh (1999). The target is then located in the SAS image and is windowed around selected location. 
This windowed image contains the information to reconstruct the frequency signals associated with a given target via 
inverting the a> - k beamformer Khwaja et al. (2005) and the responses were aligned in rage using the known acquisi- 
tion geometry. For the nine targets, 2000 different data collections runs were done, and 1 102 acoustic color templates 
were generated using the method described above from the data set. From the acoustic color maps, one-dimensional 
functional data was generated by taking slices at aspect value of 0° and therefore generating 1102 data samples. We 
will apply our method to this SONAR data, where there are n — 1102 SONAR signals with nine target classes and 
the numbers of functions in the nine classes are {n,}^ =1 = {131, 144, 118, 118, 121,119, 120, 1 14, 117} and are sampled 
using 483 points. A selected subset of functions in each class from the original data is shown in Fig. 12. We observe 
that the original data are quite noisy, due to both the compositional and the additive noise, increasing variability within 
class and reducing separation across classes. This naturally complicates the task of target classification using SONAR 
signals. 

To again have a robust estimate of the SRSF {<?,}, we first smooth the original signals 25 times {/■} using the 
standard box filter described previously. As with the previous data sets, the smoothed SONAR data are aligned in 
each class using our method. Models were generated for the three classes by performing vertical and horizontal fPCA 
on the aligned data and the warping functions then, imposing a multivariate Gaussian on the reduced data for each 
class, with the aligned data shown in Fig. 13. The threshold to select the number of dimensions, k\ and k%, was set at 
90%. Table 4 presents the classification rates for the cross-validation for all three rules as well as comparing to the 
standard L 2 . 

The classification rates have low standard deviation indicating good generalization for the SONAR data. The phase 
only rule and the amplitude only rule out perform the standard L 2 with the combination providing the best performance 
at 54%. The alignment and modeling using a proper distance improves the overall classification performance of 
the data. We again used the kernel density estimator to compare the results with the Gaussian assumption and the 
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Figure 12: Original SONAR functions in each of the 9 classes. 



Gaussian Kernel Density 



amplitude only 


0.44 


(0.03) 


0.47 


(0.02) 


phase only 


0.42 


(0.02) 


0.43 


(0.02) 


phase and amplitude 


0.54 


(0.03) 


0.53 


(0.03) 


standard L 2 


0.33 


(0.01) 


0.34 


(0.02) 



Table 4: Mean classification rate and standard deviation (in parentheses) for 5-fold cross-validation on SONAR data. 



results are presented in Table 4. Using the kernel density estimator we see improvements in the classification results. 
However, nothing is a dramatic improvement suggesting the Gaussian assumption is sufficient for this data. 



6. Conclusions 

The statistical modeling and classification of functional data with phase variability is a challenging and compli- 
cated task. We have proposed a comprehensive approach that solves the problem of registering and modeling functions 
in a joint, metric -based framework. The main idea is to use an elastic distance to separate the given functional data 
into phase and amplitude components, and to develop individual models for these components. The specific models 
suggested in this paper use fPCA and imposition of either multivariate Gaussian or nonparametric models on the co- 
efficients. The strengths of these models are illustrated in two ways: random sampling and model-based classification 
of functional data. In the case of classification, we consider applications involving handwritten signatures, motion 
data collected using iPhones, and SONAR signals. We illustrate the improvements in classification performance when 
the proposed models involving separate phase and amplitude components are used. 
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Figure 13: Aligned and Smoothed SONAR functions in each of the 9 classes. 
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